There is no foundationless answer to this question. So let’s take some foundations from the Belmont Report and seek to ensure:

1. Respect for persons
2. Beneficence
3. Justice
Unfortunately, operationalizing these requires further ethical theories. Let’s assume that (1) is operationalized by informed consent (a very liberal idea). We are a bit at sea for (2) and (3) (the Belmont report suggests something like a utilitarian solution).
The major focus on (1) by IRBs might follow from the view that if subjects consent, then they endorse the ethical calculations made for 2 and 3 — they think that it is good and fair.
This is a little tricky, though, since the study may not be good or fair because of implications for non-subjects.
The problem is that many (many) field experiments have nothing like informed consent.
For example, whether the government builds a school in your village, whether an ad appears on your favorite radio show, and so on.
Consider three cases:
In all cases, there is no consent given by subjects.
In cases 2 and 3, the treatment is possibly harmful for subjects, and the results might also be harmful. But even in case 1, there could be major unintended harmful consequences.
In cases 1 and 3, however, the “intervention” is within the sphere of normal activities for the implementer.
Sometimes it is possible to use this point of difference to make a “spheres of ethics” argument for “embedded experimentation.”
Spheres of Ethics Argument: Experimental research that involves manipulations that are not normally appropriate for researchers may nevertheless be ethical if:
Difficulty with this argument:
Otherwise, keep the focus on consent, and desist if consent is not possible.
Experimental researchers are deeply engaged in the movement towards more transparent social science research.
Contentious issues (mostly):
Data. How soon should you make your data available? My view: as soon as possible, along with working papers and before publication. Before it affects policy in any case. Own the ideas, not the data.
Where should you make your data available? Dataverse is focal for political science. Not your personal website (mea culpa).
What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can, but at least post-cleaning and pre-manipulation.
Registration: Should you register? Hard to find reasons against, but the case is strongest in the testing phase rather than the exploratory phase.
Registration: When should you register? My view: before treatment assignment. (Not just before analysis, mea culpa.)
Registration: Should you deviate from a preanalysis plan if you change your mind about the optimal estimation strategy? My view: yes, but make the case and describe both sets of results.
File drawer bias (Publication bias)
Analysis bias (Fishing)
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.
– Even if each individual study is indisputably correct, the account in the research record – that \(X\) affects \(Y\) in 100% of cases – will be wrong.
Exacerbated by:
– Publication bias – the positive results get published
– Citation bias – the positive results get read and cited
– Chatter bias – the positive results get blogged, tweeted, and TEDed.
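A minimal simulation sketch of this logic in Python (the setup and numbers are illustrative, not from the text): each study is assumed to classify its own case correctly, and only positive results are written up.

```python
# Sketch of file-drawer bias, assuming each study is "indisputably correct"
# about its own case; only the write-up stage is selective.
import random

random.seed(1)
n = 1000

# In truth, X affects Y in 50% of cases.
truths = [random.random() < 0.5 for _ in range(n)]

# Every study is correct, but only "positive" results get written up.
record = [t for t in truths if t]

print(f"True rate: {sum(truths) / n:.0%}")                              # ~50%
print(f"Rate in the research record: {sum(record) / len(record):.0%}")  # 100%
```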
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or to select statistical models after seeing \(X\) and \(Y\) in each case.
– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.
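A companion sketch of fishing (again with illustrative numbers): in the null cases the researcher tries, say, ten candidate outcome measures at the 5% level and reports whichever one “works.”

```python
# Sketch of analysis bias ("fishing"): X affects Y in 50% of cases; in the
# other 50%, discretion over 10 candidate measures yields many false positives.
import random

random.seed(1)
n, alpha, n_measures = 1000, 0.05, 10

positives = 0
for _ in range(n):
    if random.random() < 0.5:
        positives += 1  # real effect, assumed reliably detected
    # No real effect: the chance that at least one of 10 independent 5%-level
    # tests "hits" is 1 - 0.95**10, roughly 40%.
    elif any(random.random() < alpha for _ in range(n_measures)):
        positives += 1

print(f"Share of studies reporting an effect: {positives / n:.0%}")  # ~70%
# More discretion (more measures, more specifications) pushes this toward 100%.
```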
– Try it yourself: An Exact Fishy Test (https://macartan.shinyapps.io/fish/)
– What’s the problem with this test?
When your conclusions do not really depend on the data:
– some evidence will always support your proposition
– some interpretation of evidence will always support your proposition
Knowing the mapping from data to inference in advance gives a handle on the false positive rate.
Likelihoods

| | $K_1$ = No | $K_1$ = Yes | All |
|---|---|---|---|
| $K_2$ = No | 0.9 | 0.05 | 0.95 |
| $K_2$ = Yes | 0.05 | 0 | 0.05 |
| All | 0.95 | 0.05 | 1 |

| | $K_1$ = No | $K_1$ = Yes | All |
|---|---|---|---|
| $K_2$ = No | 0 | 0.05 | 0.05 |
| $K_2$ = Yes | 0.05 | 0.9 | 0.95 |
| All | 0.05 | 0.95 | 1 |
Source: Gerber and Malhotra
Implications are:
Summary: we do not know when we can or cannot trust claims made by researchers.
[Not a tradition-specific claim]
Simple idea:
– It’s about communication:
  – just say what you are planning on doing before you do it
  – if you don’t have a plan, say that
  – if you do things differently from what you were planning to do, say that
Bells and whistles
– To be really useful a registry would have to have some credibility, some searchability, and some consistency in fields.
Elements:
Make it a facility
Non-mandatory
Non-binding
But comprehensive
Report whether registered or not
Report changes in plans
For discussion: but claims of “tests” seem like a good start
Center for Open Science Badges
Notations: PR (peer review certified), DE (data exist), and TC (transparent changes)
– Hard form: medical sciences for RCTs. Soft form: medical sciences for observational studies.
– AEA for RCTs. APSA?
– Funder-led model (more mandatory): RIDIE, NSF?
– Bottom-up model? E.g., established by APSA sections: CQRM, PolMeth, Experiments? No formal journal recognition.
But even the simple idea is not everywhere welcome. There are many worries and some myths.
Fishing can happen in very subtle ways, and may seem natural and justifiable.
Example:
– I am interested in whether more democratic institutions result in better educational outcomes.
– I examine the relationship between institutions and literacy, and between institutions and school attendance.
– The attendance measure is significant and the literacy one is not. Puzzled, I look more carefully at the literacy measure and see various outliers and indications of measurement error. As I think more, I realize too that literacy is a slow-moving variable and may not be the best measure anyhow. I move forward and analyze the attendance measure only, perhaps conducting new tests, albeit with the same data.
Our journal review process is largely organized around advising researchers on how to adjust analyses in light of findings in the data.
Frequentists can do it
Bayesians can do it too.
Qualitative researchers can also do it.
You can even do it with descriptive statistics
The key distinction is between prospective and retrospective studies.
Not between experimental and observational studies.
A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects
A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.
– It does shift preparation of analyses forward.
– And it can also increase the burden of developing analysis plans even for projects that don’t work out. But that is, in part, the point.
In neither case would the creation of a registration facility prevent exploration.
What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.
Registration communicates whether researchers are engaged in exploration or not. We love exploration and should be proud of it.
Does registering analyses of historical data make sense?
The problem is not just that researchers might have already seen the testing data; but that they have seen data that is correlated with it.
Consider historical proposition H.
– Say we start with a prior of .5 that H is true.
– Say that if H is true then we observe K1 with probability 0.8, but if it is false we observe K1 with probability 0.2 (“double decisiveness”).
– Similarly, if H is true then we observe K2 with probability 0.8, but if it is false we observe K2 with probability 0.2 (“double decisiveness” again).
Say we observe K1 (some collection of facts)
We then update our belief in H…
Our updated belief is: \[\Pr(H \mid K_1) = \frac{\Pr(K_1 \mid H)\Pr(H)}{\Pr(K_1)} = \frac{.8 \times .5}{.8 \times .5 + .2 \times .5} = 80\%\]
We are now 80% confident in proposition H.
We decide to look for evidence K2. And we find it!
Our posterior is now (with \(\Pr(H)\) now the updated prior of .8): \[\Pr(H \mid K_2) = \frac{\Pr(K_2 \mid H)\Pr(H)}{\Pr(K_2)} = \frac{.8 \times .8}{.8 \times .8 + .2 \times .2} = 94\%\]
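A minimal sketch of this sequential updating in Python (the `bayes` helper is my own, not from the text):

```python
# Bayes' rule: posterior Pr(H | K) from a prior and the two likelihoods of K.
def bayes(prior: float, p_k_if_h: float, p_k_if_not_h: float) -> float:
    return p_k_if_h * prior / (p_k_if_h * prior + p_k_if_not_h * (1 - prior))

p_h = bayes(0.5, 0.8, 0.2)  # after observing K1: 0.80
p_h = bayes(p_h, 0.8, 0.2)  # naive update after K2: ~0.94
print(round(p_h, 2))
```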
Or is it?
What if there are correlated probabilities?
Suppose, for example, that \(\Pr(K_1 \& K_2 \mid H) = .76\) and \(\Pr(K_1 \& K_2 \mid \lnot H) = .16\): given K1, K2 appears with probability .95 if H is true, but with probability .8 even if H is false. Then
\[\Pr(H \mid K_1 \& K_2) = \frac{.76 \times .5}{.76 \times .5 + .16 \times .5} = 83\%\]
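A short sketch of the same computation (numbers as above):

```python
# Posterior using the joint likelihoods Pr(K1 & K2 | H) = .76 and
# Pr(K1 & K2 | not H) = .16, rather than treating K1 and K2 as independent.
prior = 0.5
posterior = 0.76 * prior / (0.76 * prior + 0.16 * (1 - prior))
print(round(posterior, 2))  # 0.83, not the naive 0.94
```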
In a sense the fishing has already happened.
How so?
Say the proposition is FALSE but K1 is still observed
A decision is then made to seek “new data” K2
Now K2 will be observed with 80% probability even though H is false
Naïve inference (using a prior of 80% due to K1): 94% if K2; 50% if not K2
Inference if K1 used to decide on search for K2 but prior is “reset” to .5: 80% if K2; 20% if not K2
Sophisticated inference: 83% if K2; 50% if not K2
This sophisticated inference is unchanged if you take explicit account of the fact that searching for K2 was conditional on K1; either way it is still \(\Pr(H | K1, K2)\).
It requires assessing the probability of knowing what you know now and finding out what you will find, if the proposition is true or false.
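A sketch pulling the three rules together, covering both the K2-observed and K2-not-observed branches (numbers from above; the `bayes` helper is again mine):

```python
# Compare naive, "reset", and sophisticated inference about H.
def bayes(prior, p_if_h, p_if_not_h):
    return p_if_h * prior / (p_if_h * prior + p_if_not_h * (1 - prior))

# Naive: prior 0.8 after K1, independent likelihoods for K2.
naive = (bayes(0.8, 0.8, 0.2), bayes(0.8, 0.2, 0.8))                 # (0.94, 0.50)

# Reset: prior back to 0.5, independent likelihoods for K2.
reset = (bayes(0.5, 0.8, 0.2), bayes(0.5, 0.2, 0.8))                 # (0.80, 0.20)

# Sophisticated: joint likelihoods Pr(K1 & K2 | .) and Pr(K1 & not-K2 | .).
soph = (bayes(0.5, 0.76, 0.16), bayes(0.5, 0.8 - 0.76, 0.2 - 0.16))  # (0.83, 0.50)

for name, (if_k2, if_not_k2) in [("naive", naive), ("reset", reset),
                                 ("sophisticated", soph)]:
    print(f"{name}: {if_k2:.0%} if K2, {if_not_k2:.0%} if not K2")
```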
Can such beliefs be elicited? Perhaps.
Incentives and strategies